The actor model of computation has been designed for seamless support of concurrency and distribution. However, it remains unspecific about data parallel program flows, while the available processing power of modern many-core hardware such as graphics processing units (GPUs) or coprocessors increases the relevance of data parallelism for general-purpose computation. In this work, we introduce OpenCL-enabled actors to the C++ Actor Framework (CAF). This offers a high-level interface for accessing any OpenCL device without leaving the actor paradigm. The new type of actor is integrated into the runtime environment of CAF and gives rise to transparent message passing in distributed systems on heterogeneous hardware. Following the actor logic in CAF, OpenCL kernels can be composed while encapsulated in C++ actors, hence operate in a multi-stage fashion on data resident at the GPU. Developers are thus enabled to build complex data parallel programs from primitives without leaving the actor paradigm, nor sacrificing performance. Our evaluations on commodity GPUs, an Nvidia TESLA, and an Intel PHI reveal the expected linear scaling behavior when offloading larger workloads. For sub-second duties, the efficiency of offloading was found to largely differ between devices. Moreover, our findings indicate a negligible overhead over programming with the native OpenCL API.